Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add alpha feature to output big data passing file path instead of task run name #993

Merged
merged 9 commits into from
Jul 19, 2022

Conversation

Tomcli
Copy link
Member

@Tomcli Tomcli commented Jul 7, 2022

Which issue is resolved by this Pull Request:
Resolves #987

Description of your changes:
Currently Big data passing output the task name for incoming tasks to construct the result path as its input. We can to change the output itself to a path so that the incoming tasks won't have extra logics to construct paths.

Things we found that each task has its own workspace prefix, so the workspace variable is still coded as part of the consuming path.

To enable this feature, run

export OUTPUT_BIG_DATA_PATH=true

Environment tested:

  • Python Version (use python --version):
  • Tekton Version (use tkn version):
  • Kubernetes Version (use kubectl version):
  • OS (e.g. from /etc/os-release):

Checklist:

@google-oss-prow google-oss-prow bot added size/M and removed size/L labels Jul 7, 2022
@Tomcli Tomcli changed the title [WIP] Add alpha feature to output big data passing file path instead of task run name Add alpha feature to output big data passing file path instead of task run name Jul 7, 2022
@Tomcli
Copy link
Member Author

Tomcli commented Jul 7, 2022

cc @Udiknedormin

sdk/python/kfp_tekton/compiler/_data_passing_rewriter.py Outdated Show resolved Hide resolved
appended_taskrun_path_step = _get_base_step('output-taskrun-path')
if len(appended_taskrun_path_step['command']) <= 2:
appended_taskrun_path_step['command'].append('')
appended_taskrun_path_step['command'][-1] += 'echo -n "%s/%s/%s" > $(results.%s-path.path)\n' % \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering --- could a prefix (or suffix, or format in general) for paths and values be generalized somehow? My immediate thought would be user-generated outputs and complains about not being able to use FOO (standardized into foo) and FOO_PATH (standardized into foo-path) in the same task.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I parametrize as a python global variable and renamed the suffix to -datap (short for data path). If you have a better naming idea I can update it in the global variable.

Comment on lines 687 to 688
param.get('name').endswith("-trname") or
param.get('name').endswith("-path")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using suffixes to do some magic in the compiler seems tricky. If the user creates a task with a result named e.g. asset-path, it will be interpreted as a path to asset result (which does or does not exist), is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are to remove any generated results that are pushed into the pipelineloop or generated by the custom taskspec. I think in the newer Tekton 0.37, we can specify a result type other than string. So we can also looking into using result type to specify these special results.

Copy link
Member

@yhwang yhwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Tomcli, yhwang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit e2aa560 into kubeflow:master Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Big data passing output result path instead of only the task name
3 participants